State Transition Constraints

Comments 0

Share to social media

About two decades ago, I introduced the concept of transition constraints to show Data Validation in a database is a lot more complex than seeing if a string parameter really is an integer. In October of 2008, I did an article called Constraint Yourself! on how to use DDL constraints to assure data integrity. One of the topics in that piece was a look at state transition constraints via an auxiliary table.

Introduction

Let me give you some introduction, and for some of you flashbacks to your early computer science classes. There is an initial state, flow lines that show what are the next legal states, and one or more termination states. These diagrams also called finite state automata, and you’ll see it in freshman computer science classes. To date, we might implement them with a graph database. What is worth relearning are our old tricks from SQL.

1157-BornMarriedDivorcedDead.jpg

This state transition diagram was deliberately simplified, but it is good enough to explain principles. To keep the discussion as simple as possible, my table is for only one person’s marital status over his life. Here is a skeleton DDL with the needed FOREIGN KEY reference to valid state changes and the date that the current state started.

What is not shown on it are which nodes are initial states (in this case “Born”) and which are terminal or final states (in this case “Dead”, a very terminal state of being). A terminal node can be the current state of a middle node, but not a prior state. Likewise, an initial node can be the prior state of a middle node, but not the current state. I did not write any CHECK() constraints for those conditions. It is easy enough to write a quick query with an EXISTS() predicate to do this and I will leave that as an exercise for the reader. Let’s load the diagram into an auxiliary table with some more constraints.

The Time Dimension

An aspect of this problem that I did not consider in the first article is the time dimension. We want to see a temporal path from an initial state to a terminal state. State changes do not happen all at once, but are spread over time. An acorn becomes an oak tree before it becomes lumber and finally my chest of drawers. The acorn does not jump immediately to being a chest of drawers.

Some of the changes are controlled by time. For example, one cannot get married immediately after being born, they have to wait to be of legal age. A business offer can expire in a set number of days. I am sure you can think of many additional examples as well.

For a production system, you would need a more complete set of temporal columns to guarantee that we have no gaps in the history, but this will do for now. We now need a stored procedure to add data to the MyLife table. Here is one solution which is deliberately broken into clear steps for clarity.

The first block of code locates the most recent state of my life, based on the date. The second block of code will insert an initial state if the table is empty. This is a safety feature but there probably ought to be a separate procedure to create the set of initial states. The new state has to be an actual change, so there is a block of code to be sure. The changes must move forward in time. Finally, we build a row using the most recent state as the new previous state, the input change state and the date. If the state change is illegal, the FOREIGN KEY is violated, and we get an error.

If you had other business rules, you could also add them to the code in the same way. You should have noticed that if someone makes changes directly to the MyLife Table, they are pretty much free to screw the data. It is a good ideas to have a procedure that checks to see that MyLife is in order. Let’s load the table with bad data:

This poor person popped into existence without being properly born , committed bigamy and died twice. And you think your life is tough! Here is a simple validation procedure to catch those errors.

The CTE numbers the steps of the temporal path from an initial node to a middle or terminal node. This chain has to be unbroken which means going from step (n) to step(n+1) has to be a legal change in the StateChanges table. This chain can have only one initial node, so let’s check for that next. Finally, the chain is either still in progress or it has reached a single terminal node.

A little note on the programming technique used. The union of separate queries to do one validation at a time can often be made faster by combining some of the queries. But there are trade-offs; this code is easy to read and to maintain and (hopefully) it will not be run often. It is also hard to get error messages from a single statement. Look back at the ChangeState() procedure; the two IF and RAISERROR() blocks of code could have been converted into CASE expressions that will generate NULLs, folded into the INSERT INTO statement and cause the insertion to fail.

This is not easy to read or to get error messages that tell you if the @in_change_date is invalid in that it violates the time sequence.

The Temporal Side of Changes

What is still missing is the temporal aspect of state changes. In this example, the (Born’, ‘Married’) change would have to deal with the minimum age of consent. The (Married’, ‘Divorced’) change often has a legal waiting period. While technically a business rule, you know that no human being has lived over 150 years, so a gap that size is a data error. The terminal and initial states are instantaneous, however. Let’s add more flesh to the skeleton table:

To make up some data, let’s assume that the age of consent is 18 (12 months * 18 years = 216), that you have to wait 3 months into your marriage before getting a divorce, and that you must be divorced 2 months before you can re-marry. Of course, you can die instantly.

First I will clear the data from the table, add a column for the required months between actions, and then reload the table.

The first question is where to check for temporal violations; during insertion or with validation procedures? My answer is both. Whenever possible, do not knowingly put bad data into a schema so this should be done in the ChangeState() procedure. But someone or something will subvert the schema and you have to be able to find and repair the damage.

Here is a query which will tell you what state change in the chain has an improper duration and what the disagreement is.

Inserting a new life change is not a simple matter of putting a (previous_state, current_state, start_date) row into the table. To do it right, you can put conditions into the INSERT INTO statement to cause errors when there is bad data.

A slightly different model will keep a (start_date, expiry_date) pair in the history table. In the case of the MyLife example, the durations were the minimum values for certain changes. But a lot of commercial situations have a fixed lifespan. Warranties, commercial offers and bids expire in a known number of days. This means adding another column to the StateChanges table that tells the insertion program if the expiration date is optional (shown with a NULL) or mandatory (computed from the duration).

Here is some skeleton DDL for a bid application to explain this better.

The DDL has a bid number and the start_date as the primary key and a new column for the expiration date. Obviously, the bid has to exist for a while, so add a constraint to keep the date order right.

The next DDL will extend the state change table to be a bit more flexible (if you wish to test this out, the process to recreate this would be the same as in the previous section).

The DDL for the state changes gets a new column to tell us if the duration is optional or mandatory. The insertion procedure is a bit trickier. The VALUES clause has more power than most programmers use. The list can be more than just literal values or simple scalar variables. But using CASE expressions lets you avoid if-then-else procedural logic in the procedure body.

All it needs is the bid number and what state you want to use. If you don’t give me a previous state, I assume that this is an initial row and repeat the current state you just gave me. If you don’t give me a start date, I assume you want the current date. If you don’t give me an expiration date, I construct one from the StateChanges table with a scalar subquery. Here is the skeleton DDL for an insertion procedure.

There are other tricks to assure that there are no gaps between the rows using DDL, but that is another article for another day.

Article tags

Load comments

About the author

Joe Celko is one of the most widely read of all writers about SQL, and was the winner of the DBMS Magazine Reader's Choice Award four consecutive years. He is an independent consultant living in Austin, TX. He has taught SQL in the US, UK, the Nordic countries, South America and Africa.
He served 10 years on ANSI/ISO SQL Standards Committee and contributed to the SQL-89 and SQL-92 Standards.
He has written over 800 columns in the computer trade and academic press, mostly dealing with data and databases. He is the author of eight books on SQL for Morgan-Kaufmann, including the best selling SQL FOR SMARTIES.
Joe is a well-known figure on Newsgroups and Forums, and he is famous for his his dry wit. He is also interested in Science Fiction.